Principal Investigator Name: Jeffrey D. Ullman
PI  Institution:  Stanford  University 
PI  Phone  Number:  (415)  723-9745 


AD-A257  946 



PI  Email  Address:  ullman@cs.stanford.edu 

Contract  Title:  Design  and  Implementation  of  Parallel  Algorithms 
Contract  Number:  N00014-88-K-0166 
Reporting  Period:  10/90-9/91 


1.  NUMERICAL  PRODUCTIVITY  MEASURES 

Refereed  papers,  submitted,  not  published:  11 

Refereed  papers  published:  4 

Unrefereed  reports  and  articles:  3 

Books  or  parts  submitted,  not  published:  0 

Books  or  parts  published:  2 

Patents  filed,  not  granted:  1 

Patents  granted:  0 

Invited  presentations:  6 

Contributed  presentations:  6 

Honors  received:  2 

Prizes  or  awards  received:  0 

Promotions  obtained:  0 

Graduate  students  supported:  3 

Postdocs  supported:  2 

Minorities  supported:  0 


DTIC ELECTE NOV 19 1992

DISTRIBUTION STATEMENT A: Approved for public release

92-29841



2.  DETAILED  SUMMARY  OF  TECHNICAL  RESULTS 
Coupled Execute/Control Processor Architecture

Andy Freeman has designed and simulated a new processor architecture for use in
multiprocessors. The basic problem is designing processors that can cope with long memory
latency. His approach is to couple two processors by two-way queues; neither processor has
the full capability associated with conventional processors. One executes only “straight-line
code,” i.e., computations, and the second executes only tests and branches. Through
simulation he has shown the superiority of this architecture on applications that are similar to
“vectorizable” computations, including those on which vector architectures perform well
and some that are (slightly) too sequential for vector processors. No reports on this work
have yet been produced.
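
The division of labor described above can be illustrated with a toy simulation. This is only a sketch under assumptions: the names run_coupled, blocks, and branch are hypothetical, the two units run in lockstep, and nothing here models memory latency or reflects the actual design, for which no report exists. It shows only that one unit runs straight-line blocks while the other performs every test and branch, with the two talking through queues.

```python
from collections import deque


def run_coupled(blocks, branch, start, regs):
    """Toy model of two coupled units joined by two queues.

    blocks: block name -> function(regs) that does straight-line work and
            returns one value to hand to the control unit.
    branch: block name -> function(value) returning the next block name,
            or None to halt. Only these functions contain tests.
    """
    to_execute = deque([start])  # control unit -> execute unit
    to_control = deque()         # execute unit -> control unit
    while to_execute:
        # Execute unit: run one straight-line block, then report a value.
        name = to_execute.popleft()
        to_control.append((name, blocks[name](regs)))
        # Control unit: test the reported value and choose the next block.
        finished, value = to_control.popleft()
        next_block = branch[finished](value)
        if next_block is not None:
            to_execute.append(next_block)
    return regs


def body(regs):
    # Straight-line code only: no tests, no branches.
    regs["i"] += 1
    regs["total"] += regs["i"]
    return regs["n"] - regs["i"]  # value the control unit will test


# Hypothetical program: add up 1..n with a loop whose test lives in `branch`.
example_blocks = {"body": body}
example_branch = {"body": lambda remaining: "body" if remaining > 0 else None}
```

Calling run_coupled(example_blocks, example_branch, "body", {"i": 0, "n": 5, "total": 0}) runs the loop body five times and leaves the sum in regs["total"].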


Timestamping  in  Networks 

Orli Waarts and Cynthia Dwork of IBM have looked at the question of assigning
timestamps in an asynchronous network of processors (Dwork and Waarts [1991]). The model
is that of shared memory, where the processors communicate through a memory accessible
to all; the memory, like the processors, may be distributed. The problem is to issue tickets
in first-come-first-served order when an unlimited number of requests for service
may arrive asynchronously. Their solution uses time proportional to the number of
processors involved, while previous solutions were superlinear.

A  key  problem  to  solve  is  avoiding  a  bottleneck  at  one  memory  location.  Obvious 
approaches  have  the  processors  queue  up  for  access  to  a  counter  that  assigns  timestamps. 
The  solution  of  Dwork  and  Waarts  uses  instead  a  bounded  timestamp,  which  is  in  the  form 
of  a  vector  of  values,  one  for  each  processor.  Each  processor  controls  one  component  of 
the  vector.  In  particular,  we  must  be  able  to  recycle  timestamps  so  we  do  not  confuse  a 
new  value  with  an  old,  identical  value.  We  avoid  this  problem  by  allowing  each  processor 
to  designate  which  of  its  values  comes  first;  the  others  follow  in  a  fixed,  cyclic  order. 
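
The last point, ordering a bounded set of recyclable values from a designated first value, can be sketched in a few lines. This is an illustrative assumption, not the construction of Dwork and Waarts: the names cyclic_rank and comes_before are hypothetical, and the values are taken to be the integers 0 through size - 1.

```python
def cyclic_rank(value, first, size):
    """Position of `value` when the cycle 0..size-1 is read starting at `first`."""
    return (value - first) % size


def comes_before(a, b, first, size):
    """True if `a` precedes `b` in the cyclic order that starts at `first`."""
    return cyclic_rank(a, first, size) < cyclic_rank(b, first, size)
```

With size 4 and first value 2, the order is 2, 3, 0, 1, so a recycled 0 is correctly treated as newer than 3 even though it is numerically smaller.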

Linearizable  Counting 

A related problem, called “linearizable counting,” assigns consecutive integers to processors
(timestamping assigns tokens, which might be integers, to processes) in first-come-first-served
order. The algorithm of Herlihy, Shavit, and Waarts [1991] performs this assignment
in time proportional to the number of processors, and does so in a way that does not
produce a bottleneck at a shared memory location.

Processor Assignment


Yossi Azar and Joseph Naor, with R. Rom of IBM, looked at the problem of on-line
assignment of tasks to processors (Azar, Naor, and Rom [1991]). That is, a sequence of
tasks enters the system asynchronously, and each can be performed by any of a subset of
the n processors. We must assign tasks to processors as they enter, and the object is to
balance the load on the processors. They offer an algorithm that is never worse than a
factor of log_2 n compared with a “clairvoyant” algorithm that can predict future demand for
the various processors. They also show this ratio is best possible. Further, they consider
the use of a randomized algorithm for the same assignment problem, and show that a ratio
of log_e n is sufficient and best possible.
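
The on-line setting can be made concrete with a small sketch. The summary does not spell out the algorithm, so the rule below (give each arriving task to the least-loaded processor among those allowed) is an assumption chosen as the natural greedy strategy; unit-size tasks that never leave, and the name greedy_assign, are likewise assumptions for illustration.

```python
def greedy_assign(n, tasks):
    """Assign each arriving task to its least-loaded permitted processor.

    n:     number of processors, numbered 0..n-1.
    tasks: list of non-empty lists; tasks[t] holds the processors able to
           run task t. Tasks are handled strictly in arrival order.
    Returns (assignment, load): the chosen processor per task and the
    final number of tasks on each processor.
    """
    load = [0] * n
    assignment = []
    for allowed in tasks:
        # Ties are broken toward the lower processor number.
        chosen = min(allowed, key=lambda p: (load[p], p))
        load[chosen] += 1
        assignment.append(chosen)
    return assignment, load
```

For example, greedy_assign(3, [[0, 1], [0, 1], [0], [1, 2]]) returns the assignment [0, 1, 0, 2] with loads [2, 1, 1]. The decision for each task uses only the loads seen so far, which is what separates the on-line algorithm from the clairvoyant one.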

Derandomization  of  Algorithms 

Many of the best known parallel algorithms are probabilistic, in the sense that they perform
well with high probability, but there is no guarantee they will finish in any particular
amount of time. In many cases, it is possible to replace a probabilistic, parallel algorithm
by a deterministic parallel algorithm by discovering a small set of bit strings that can
represent possible sequences of “coin flips” (the steps that introduce the randomization).
It is necessary that these small sets have certain properties of pseudo-randomness, in the
sense that on any input there is at least one among them on which the probabilistic
algorithm will terminate relatively quickly.
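
A standard textbook instance of this idea, offered here as a sketch and not as a construction from the work reported, is finding a large cut in a graph. Placing each vertex on a random side cuts half the edges in expectation, and that analysis needs only pairwise independence. A set of about 2n bit strings with pairwise-independent bits therefore contains at least one string that cuts at least half the edges. The names pairwise_bits and derandomized_cut are hypothetical.

```python
def pairwise_bits(seed, n):
    """Expand a short seed into n pairwise-independent bits.

    Bit v is the parity of (seed AND (v + 1)). Distinct vertices use
    distinct nonzero masks, so over a uniformly random seed any two bits
    differ with probability exactly 1/2.
    """
    return [bin(seed & (v + 1)).count("1") % 2 for v in range(n)]


def derandomized_cut(n, edges):
    """Try every seed in the small sample space and keep the best cut.

    n:     number of vertices, numbered 0..n-1.
    edges: list of (u, v) pairs with u != v.
    Returns (side, cut_size), where side[v] is 0 or 1.
    """
    seed_bits = n.bit_length()  # smallest k with n <= 2**k - 1
    best_side, best_cut = [], -1
    for seed in range(1 << seed_bits):
        side = pairwise_bits(seed, n)
        cut = sum(1 for u, v in edges if side[u] != side[v])
        if cut > best_cut:
            best_side, best_cut = side, cut
    return best_side, best_cut
```

The loop over seeds is written sequentially, but each seed is independent of the others, so the trials could be run in parallel, one seed per processor.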

Azar, Motwani, and Naor [1991] address the problem of approximating arbitrary joint
distributions of random variables. They construct, for any ε > 0, a set of strings that
is of size polynomial in both the number of variables and 1/ε, whose joint distribution
approximates that of the given distribution, in the sense that none of the coefficients of
the Fourier transforms of the two distributions differ by more than ε. This result can be
used in two ways. First, it is possible to run the original random algorithm using each of
the strings of “coin flips” in the set, in parallel, terminating as soon as the first among
them terminates. Second, it allows one to replace a large number of coin flips by a smaller
number (the logarithm of the size of the set) that is “almost as random” as the original.
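
The closeness measure can be checked by brute force on tiny examples. The sketch below assumes one common normalization, in which the Fourier coefficient of a distribution on a subset S of the variables is the expected value of (-1) raised to the sum of the bits in S; the paper's own normalization may differ. Both function names are hypothetical, and each list of bit tuples stands for the uniform distribution over its entries.

```python
from itertools import combinations


def fourier_coefficient(strings, subset):
    """Average of (-1)**(sum of the bits indexed by `subset`) over `strings`."""
    total = 0
    for s in strings:
        parity = sum(s[i] for i in subset) % 2
        total += -1 if parity else 1
    return total / len(strings)


def max_coefficient_gap(sample, target, n):
    """Largest difference between matching Fourier coefficients.

    Enumerates all 2**n subsets of the n variables, so it is usable only
    for very small n.
    """
    gap = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            diff = abs(fourier_coefficient(sample, subset)
                       - fourier_coefficient(target, subset))
            gap = max(gap, diff)
    return gap
```

A sample set approximates the target in the sense described above when max_coefficient_gap(sample, target, n) is at most ε.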

Bipartite  Matching 

Andy Goldberg and Serge Plotkin looked at the open question of sublinear (NC) algorithms
for parallel bipartite matching (Fisher, Goldberg, and Plotkin [1991]). This problem, in
addition to being one of the most important of the classical combinatorial problems, has
special significance for parallel computation because a number of other important problems,
such as depth-first search, are known to have NC algorithms if bipartite matching does.
They show that there is an NC algorithm for a strong form of approximate matching:
for any ε > 0, a version of the algorithm guarantees a matching whose size comes
within a factor of 1 - ε of the maximum matching.

Earlier work on using interior point methods in parallel algorithms has appeared in a
journal (Goldberg, Plotkin, Shmoys, and Tardos [1991]).

Network Flow Problems

A  survey  of  parallel  algorithms  for  network  flow  was  published  (Goldberg,  Tardos,  and 
Tarjan  [1990]).  Also,  earlier  work  on  reducing  the  number  of  processors  needed  to  solve 
network  flow  problems  in  parallel  appeared  in  a  journal  (Goldberg  [1991]). 

Journal  Publication  of  Old  Results 

Several other works, reported last year or earlier, have now appeared in or been accepted by
journals. These include Azar [1991a, b], Dolev et al. [1991], and Ullman and Yannakakis
[1991].

4. TRANSITIONS AND DOD INTERACTIONS

We note the survey by Goldberg, Tardos, and Tarjan [1990]. Also, the text on parallel
algorithms edited by John Reif, which contains articles by Gibbons and by Ullman and which
we reported last year, is expected to reach the bookstores by the end of the year.

Andy  Freeman  has  discussed  his  architectural  proposals  with  representatives  of  IDA. 

5. SOFTWARE AND HARDWARE PROTOTYPES

None at a distributable stage. Andy Freeman has an extensive simulator for
microprocessors, and Andy Goldberg has been working on code for parallel flow on a Connection
Machine, in connection with a competition for algorithms of this type.